A Local-Conscious Global Register Allocator for VLIW DSP Processors with Distributed Register Files

Authors

  • Chia-Han Lu
  • Yung-Chia Lin
  • Yi-Ping You
  • Jenq-Kuen Lee
Abstract

Embedded processors developed in recent years have employed novel hardware designs to reduce ever-growing complexity, power dissipation, and die area. Although a distributed register file architecture with irregular access constraints is considered an effective alternative to traditional unified register file structures, conventional compilation techniques are not adequate to exploit such register file organizations for optimal performance. This paper presents a novel register allocation scheme, composed of global and local register allocation, for a VLIW DSP processor with distributed register files whose port access is highly restricted. In the scheme, a sub-phase named global/local RFA (register file assignment) is introduced before the original global/local register allocation to minimize the various register file communication costs. For the featured register file structure, in which each cluster contains heterogeneous register files, a conventional register allocation scheme with cluster assignment alone must be enhanced to cope with both inter-cluster and intra-cluster communication. Because global RFA can heavily influence local RFA, a heuristic algorithm is proposed in which global RFA makes communication decisions that suit local RFA. Experiments were conducted with a compiler under development based on the Open Research Compiler (ORC), and the results indicate that compilation with the proposed approach delivers a significant performance improvement compared with the solution using only the PALF scheme developed in our previous work.

Similar resources

Expression Rematerialization for VLIW DSP Processors with Distributed Register Files

Spill code is the overhead of memory load/store operations incurred when the available registers are not sufficient to hold all live ranges during register allocation. Previous work has proposed ways to reduce spill code for unified register files. To reduce power and cost in the design of VLIW DSP processors, distributed register files and multibank register architectures are being adopted...

LC-GRFA: global register file assignment with local consciousness for VLIW DSP processors with non-uniform register files

Embedded processors developed within the past few years have employed novel hardware designs to reduce ever-growing complexity, power dissipation, and die area. Although a distributed register file architecture is considered to require fewer read/write ports than traditional unified register file structures, it presents challenges for compilation techniques to generate efficient code...

Copy Propagation Optimizations for VLIW DSP Processors with Distributed Register Files

High-performance, low-power VLIW DSP processors are increasingly deployed in embedded devices to process video and multimedia applications. To reduce power and cost in the design of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register files. This presents new challenges for devising com...

LC-GRFA: global register file assignment with local consciousness for VLIW DSP processors with irregular register files

Embedded processors developed within the past few years have employed novel hardware designs to reduce ever-growing complexity, power dissipation, and die area. While a distributed register file architecture with irregular access constraints is considered to require fewer read/write ports than traditional unified register file structures, conventional compilation techniques can n...

ORC2DSP: Compiler Infrastructure Supports for VLIW DSP Processors

In this paper, we describe our experiences in deploying the ORC infrastructure for a novel 32-bit VLIW DSP processor (known as the PAC core), which is equipped with new architectural features such as distributed and ‘ping-pong’ register files. We also present methods for retargeting ORC compilers to PAC VLIW DSP processors. In addition, mechanisms are proposed to incorporate register allocation policies i...

Publication date: 2007